A Bit-Compatible Shared Memory Parallelization for ILU(k) Preconditioning and a Bit-Compatible Generalization to Distributed Memory

Authors

  • Xin Dong
  • Gene Cooperman
Abstract

ILU(k) is an important preconditioner, widely used in linear algebra solvers for sparse matrices. Until now, there has been no highly scalable parallel ILU(k) algorithm. This paper presents the first such scalable algorithm. For example, the new algorithm achieves a 50-fold speedup on 80 nodes for general, diagonally dominant sparse matrices of dimension 160,000. The algorithm assumes that each node has sufficient memory to hold the matrix. The parallelism is task-oriented. We present experimental results for k = 1 and k = 2, the cases most commonly used in practical applications. The results are presented for three platforms: a departmental cluster with Gigabit Ethernet; a high-performance cluster using an InfiniBand interconnect; and a simulation of a Grid computation with two or three participating sites.
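For readers unfamiliar with the level-of-fill idea behind ILU(k), the following is a minimal sequential sketch of the standard textbook formulation, not the parallel algorithm this paper presents. It assumes dense list-of-lists storage for readability (a real implementation would use a sparse format), and the function name `ilu_k` and its parameter `level` are illustrative names, not taken from the paper. Original nonzeros get level 0; a fill-in created from entries of levels p and q gets level p + q + 1; entries whose level exceeds k are dropped.

```python
import math


def ilu_k(A, level):
    """Sequential level-based incomplete LU (textbook ILU(k) sketch).

    A is a dense square matrix given as a list of lists of floats.
    Returns (LU, lev): LU holds the unit-lower factor L below the diagonal
    and U on and above it; lev holds the fill level of every entry.
    """
    n = len(A)
    LU = [row[:] for row in A]
    # Level 0 for original nonzeros and the diagonal, infinity elsewhere.
    lev = [[0 if (A[i][j] != 0 or i == j) else math.inf for j in range(n)]
           for i in range(n)]

    for i in range(1, n):
        for p in range(i):
            if lev[i][p] > level:
                continue  # this multiplier position is not allowed to fill
            LU[i][p] /= LU[p][p]
            for j in range(p + 1, n):
                if lev[p][j] > level:
                    continue  # entry of row p was dropped earlier
                LU[i][j] -= LU[i][p] * LU[p][j]
                lev[i][j] = min(lev[i][j], lev[i][p] + lev[p][j] + 1)
        # Drop every entry of row i whose fill level exceeds the limit.
        for j in range(n):
            if lev[i][j] > level:
                LU[i][j] = 0.0
    return LU, lev


if __name__ == "__main__":
    # Small "arrow" matrix: exact LU creates fill at positions (1,2) and (2,1).
    A = [[4.0, 1.0, 1.0],
         [1.0, 4.0, 0.0],
         [1.0, 0.0, 4.0]]
    for k in (0, 1):
        LU, _ = ilu_k(A, k)
        print("ILU(%d):" % k)
        for row in LU:
            print("  ", row)
```

On this small matrix, ILU(0) discards the two fill-in entries, while ILU(1) keeps them and therefore coincides with the exact LU factorization. Larger k trades more memory and work for a factorization closer to the exact one.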

Similar resources

A Bit-Compatible Parallelization for ILU(k) Preconditioning

ILU(k) is a commonly used preconditioner for iterative linear solvers for sparse, non-symmetric systems. It is often preferred for the sake of its stability. We present TPILU(k), the first efficiently parallelized ILU(k) preconditioner that maintains this important stability property. Even better, TPILU(k) preconditioning produces an answer that is bit-compatible with the sequential ILU(k) prec...

A Sub-threshold 9T SRAM Cell with High Write and Read ability with Bit Interleaving Capability

This paper proposes a new sub-threshold, low-power 9T static random-access memory (SRAM) cell compatible with a bit-interleaving structure, in which effective sizing adjustment of the access transistors in write mode is provided by isolating the write and read paths. In the proposed cell, we consider a weak inverter to improve write-mode operation. Moreover, applying boosted word line feature ...

The same-source parallel MM5

With the March 1998 release of the Penn State University/NCAR Mesoscale Model (MM5), the official version of the model (MM5v2 Release 8) now runs on distributed memory (DM) message-passing platforms. Under an IBM-funded effort, source translation and runtime library support minimize the impact of parallelization on the original model source code with the result that the majority of code is line...

Parallel Multilevel Block ILU Preconditioning Techniques for Large Sparse Linear Systems

We present a class of parallel preconditioning strategies built on a multilevel block incomplete LU (ILU) factorization technique to solve large sparse linear systems on distributed memory parallel computers. The preconditioners are constructed by using the concept of block independent sets. Two algorithms for constructing block independent sets of a distributed sparse matrix are proposed. We c...

The Data Diffusion Space for Parallel Computing in Clusters

The data diffusion space (DDS) is an all-software shared address space for parallel computing on distributed memory platforms. It is an extra address space to that of each process running a parallel application under the SPMD (Single Program Multiple Data) model. The size of DDS can be up to 2 bytes, either on 32- or on 64-bit architectures. Data laid on DDS diffuses, or migrates and replicates, ...

Publication date: 2008